windowsFonts()
$serif
[1] "TT Times New Roman"
$sans
[1] "TT Arial"
$mono
[1] "TT Courier New"
https://docs.google.com/presentation/d/19u5djgMsPLtxoM-rfAQuLyP89B4nYdW8uBhNAEJQoS8/edit?usp=sharing
Serif fonts “have an extra flourish that makes it look pretty for many people, but can clutter what is on the page and that’s what makes it harder to distinguish for people with visual disabilities than just having a very clean font with no extra bits and pieces around it.”
Option 1: Using your system’s fonts
library(ggplot2)
library(palmerpenguins)  # provides the penguins data

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm)) +
  geom_point() +
  labs(title = "Penguins from the Palmer Archipelago",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)") +
  # "sans", "serif", and "mono" map to the system fonts listed by windowsFonts()
  theme(plot.title = element_text(family = "sans", size = 28),
        axis.title.x = element_text(family = "serif", size = 28),
        axis.title.y = element_text(family = "mono", size = 28)
  )
Option 2: showtext R package
library(showtext)

# Download the fonts from Google Fonts and register them under short family names
font_add_google("Gochi Hand", "gochi")
font_add_google("Montserrat", "montserrat")
showtext_auto()  # render text with showtext so the registered fonts are used

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm)) +
  geom_point() +
  labs(title = "Penguins from the Palmer Archipelago",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)") +
  theme(plot.title = element_text(family = "montserrat", size = 40),
        axis.title.y = element_text(family = "gochi", size = 40),
        axis.title.x = element_text(family = "gochi", size = 40)
  )
subset()
Return subsets of vectors, matrices or data frames which meet conditions.
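A minimal sketch of why subset() does more than one job: a single call can both keep rows and keep columns. The birds data frame and its values are made up for illustration and are not part of the original slides.

```r
# Toy data (hypothetical values, for illustration only)
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  body_mass_g = c(3700, 5000, 3800, 5400),
  flipper_length_mm = c(190, 217, 195, 221)
)

# The second argument keeps rows that meet a condition;
# the select argument keeps columns. Two tasks in one function.
heavy <- subset(birds, body_mass_g > 4000, select = c(species, body_mass_g))
heavy
```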
We want functions that accomplish one task!
We want functions with intuitive names!
filter()
select()
mutate()
summarize()
arrange()
group_by()
Brainstorm definitions for each verb
filter()
select()
mutate()
group_by()
summarize()
arrange()
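One way to check your definitions is to run each verb on a small table and compare the result with the input. The sketch below assumes the dplyr package is installed; the birds tibble and its values are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values, for illustration only)
birds <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass_g = c(3700, 3900, 5000, 5400, 3800),
  flipper_length_mm = c(190, 186, 217, 221, 195)
)

# filter(): keep the rows that meet a condition
heavy_birds <- filter(birds, body_mass_g > 4000)

# select(): keep columns by name
mass_only <- select(birds, species, body_mass_g)

# mutate(): add a new column (or change an existing one)
with_kg <- mutate(birds, body_mass_kg = body_mass_g / 1000)

# group_by(): mark groups so later verbs work within each group
by_species <- group_by(birds, species)

# summarize(): collapse each group to one row of summary values
mean_mass <- summarize(by_species, mean_mass_g = mean(body_mass_g))

# arrange(): reorder rows by a column (ascending by default)
lightest_first <- arrange(birds, body_mass_g)
```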
The Pipe |>
Suppose we would like to study how the ratio of penguin body mass to flipper size differs across the species. Arrange the following steps into an order that accomplishes this goal (assuming the steps are connected with a |>).
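One possible ordering, sketched under assumptions: "flipper size" is taken to mean the flipper_length_mm column of the palmerpenguins data, the ratio is summarized with a mean, and the names mass_flipper_ratio, mean_ratio, and ratio_by_species are invented for this example. Other summaries (for example a median) would fit the same goal.

```r
library(dplyr)
library(palmerpenguins)

ratio_by_species <- penguins |>
  # compute the ratio for every penguin
  mutate(mass_flipper_ratio = body_mass_g / flipper_length_mm) |>
  # work within each species
  group_by(species) |>
  # one row per species; na.rm = TRUE skips penguins with missing measurements
  summarize(mean_ratio = mean(mass_flipper_ratio, na.rm = TRUE)) |>
  # largest ratio first
  arrange(desc(mean_ratio))

ratio_by_species
```

The order matters: the ratio has to exist before it can be summarized, and group_by() has to come before summarize() for the summary to be computed per species.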
A Different Context
You have data on each Cal Poly student for the 2020-2021 academic year. You are tasked with reporting how the number of CR/NC courses students took differed based on department.
| name | department | CRNC_f20 | CRNC_w21 | CRNC_s21 |
|---|---|---|---|---|
| Clarke, Justin | Business | 0 | 1 | 1 |
| Hernandez, Jorge | Biology | 1 | 0 | 1 |
| Meng, Huy | Business | 1 | 0 | 0 |
| el-Munir, Farhaan | Chemistry | 3 | 0 | 3 |
| Miller, Marissa | Liberal Studies | 0 | 2 | 1 |
| Crossley, David | Biology | 1 | 1 | 0 |
| Lampe, Bianca | Business | 0 | 0 | 1 |
| Padilla, Antonio | Political Science | 1 | 2 | 1 |
| Tan, Alexandra | Liberal Studies | 0 | 2 | 1 |
| Venkatesan, Patricia | Political Science | 1 | 1 | 1 |
Problem Statement:
Department totals for number of CR/NC courses
What data wrangling operations would you use?
What order would you use to accomplish this goal?
Step 1: Get totals for each student
Step 2: Get department totals
Step 3: Arrange the totals
# A tibble: 5 × 2
  department        dept_total
  <chr>                  <int>
1 Political Science          7
2 Chemistry                  6
3 Liberal Studies            6
4 Biology                    4
5 Business                   4
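The three steps can be sketched as a single pipeline. The data below is copied from the table above; the object names crnc and dept_totals and the intermediate column student_total are assumptions made for this example, while dept_total matches the output shown.

```r
library(dplyr)

# Data copied from the table above (integer counts)
crnc <- tibble(
  name = c("Clarke, Justin", "Hernandez, Jorge", "Meng, Huy",
           "el-Munir, Farhaan", "Miller, Marissa", "Crossley, David",
           "Lampe, Bianca", "Padilla, Antonio", "Tan, Alexandra",
           "Venkatesan, Patricia"),
  department = c("Business", "Biology", "Business", "Chemistry",
                 "Liberal Studies", "Biology", "Business",
                 "Political Science", "Liberal Studies", "Political Science"),
  CRNC_f20 = c(0L, 1L, 1L, 3L, 0L, 1L, 0L, 1L, 0L, 1L),
  CRNC_w21 = c(1L, 0L, 0L, 0L, 2L, 1L, 0L, 2L, 2L, 1L),
  CRNC_s21 = c(1L, 1L, 0L, 3L, 1L, 0L, 1L, 1L, 1L, 1L)
)

dept_totals <- crnc |>
  # Step 1: total across the three quarters for each student
  mutate(student_total = CRNC_f20 + CRNC_w21 + CRNC_s21) |>
  # Step 2: add up the student totals within each department
  group_by(department) |>
  summarize(dept_total = sum(student_total)) |>
  # Step 3: largest department total first
  arrange(desc(dept_total))

dept_totals
```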
Often you are interested in one specific summary statistic!
pull()
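A minimal sketch of pull(), assuming dplyr is installed: it extracts one column as a plain vector instead of returning a tibble. The dept_totals tibble is rebuilt here from the output shown earlier so the example runs on its own; the name business_total is invented for illustration.

```r
library(dplyr)

# Department totals, copied from the tibble printed earlier
dept_totals <- tibble(
  department = c("Political Science", "Chemistry", "Liberal Studies",
                 "Biology", "Business"),
  dept_total = c(7L, 6L, 6L, 4L, 4L)
)

# Keep one department, then extract its total as a bare integer
business_total <- dept_totals |>
  filter(department == "Business") |>
  pull(dept_total)

business_total
```

Because the result is a vector rather than a one-cell tibble, it can be used directly in arithmetic or inline text.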